Patterns of research collaboration in a digital library for Eco

Authors

  • Nisa Bakkalbasi
  • Thomas Krichel
Abstract

RePEc (Research Papers in Economics) has been conceived and developed to promote scholarly communication and to enhance the dissemination of research findings in the field of economics. RePEc offers the RePEc Author Service (RAS), where economics authors can claim authorship of the research papers that are described in RePEc archives. The data from this service form a high-quality authorship database. We investigate the structure of research collaborations within RePEc by applying social network analysis to the co-authorship network formed by the RAS registrants. We perform a component size analysis and calculate centrality metrics. Our findings imply that the RAS registrant population is made up of highly active academics who are well connected to each other. In addition, RAS registrants appear to have a broad range of coauthors, with most individuals having only a few coauthors, whereas a few have many. We compare and contrast results from a number of recent studies of similar scope on coauthorship networks.

Introduction and motivation

Even though the roots of present-day digital libraries may be traced to the information retrieval systems of the 1960s, digital libraries, as we know them today, have emerged only in the past decade to serve different constituencies and communities (Fox & Urs, 2002). RePEc (Research Papers in Economics), one of the earliest digital libraries in existence, has been conceived and developed to document scholarly communication and enhance the dissemination of research findings in the field of economics (Walshe, 2001). RePEc is a collaborative effort of hundreds of volunteers in 51 countries. As of this writing, RePEc describes over 362,000 items of interest such as working papers, journal articles, software components, and instructional datasets. All RePEc data, freely available online, are contributed by academic departments, institutions involved in economics research (e.g., central banks), publishers, and individuals.
Barrueco Cruz and Krichel (2001), early pioneers of RePEc, provide a detailed discussion of their approach towards building the digital library and summarize the basic principles. RePEc is based on three concepts: “archive,” “site,” and “service.” A service is defined as a “rendering of RePEc data in a form that is available to the end user.” The definition can be expanded to include activities such as mining the archive data, discovering potentially useful information about the users, and presenting it in a form that is meaningful to the research community. To facilitate some of these activities, RePEc provides several innovative user services:

  • “NEP: New Economics Papers,” at http://nep.repec.org, is a human-mediated current awareness service.
  • “LogEc,” at http://logec.repec.org, gives detailed access statistics for RePEc items and authors.
  • “CitEc,” at http://citec.ier.hit-u.ac.jp, gives citations from items in the RePEc database.
  • The “RePEc Author Service,” at http://authors.repec.org, is an author registration service.

1. We are grateful to the Open Society Institute for funding to support the ACIS software system; see http://acis.openlib.org. The data used in this study have been gathered through a running version of the software.
2. http://repec.org

This study focuses on the RePEc Author Service (RAS), a user service that maintains personal data and thereby enables analysis of the association between authors and documents. In all bibliometric studies, the limited availability of complete and correct authorship data remains a significant obstacle to obtaining statistically accurate results. Although bibliographic data contain author names, a name does not necessarily identify an author uniquely. A number of factors contribute to the variety of ways author names appear in standard bibliographic data. Several authors may share the same name, a name can be written in various forms, and the use of initials, accents, or capitalization may be inconsistent.
To illustrate, here is an example of three names referring to the same author:

    Phillips, P C B
    Peter C.B. Phillips
    Peter Phillips

Additional problems are introduced by transliterations from non-Roman to Roman alphabets (e.g., Chaykovskiy, Tchaikovsky). In general, raw author name data have to be put through a long and arduous cleaning process before they can be used for authorship analyses. The RAS provides utilities for an author to register, provide a profile, and codify his/her own authorship data in its database. The RAS database, therefore, contains high-quality authorship data for analysis. This paper analyzes the patterns of authorship and the incidence of collaborative relationships among all RAS registrants.

The remainder of the paper is organized as follows. In section two we discuss the RAS in more detail. In particular, we assess its completeness. In section three we study co-authorship among RAS registrants using social network analysis (SNA) methodology. In section four we provide our conclusions and suggest future work.

The RePEc Author Service (RAS)

Overview and operation

To the best of our knowledge, the RAS is a one-of-a-kind service with much potential for cultivating collaboration and cooperation among “digital library” users, promoting information sharing, and creating a sense of community in a digital environment. Each author registers himself/herself voluntarily and creates a professional profile. Then, the author provides contact information, affiliation, and publications.
Following is an extract of a record from the RAS database:

    Template-Type: ReDIF-Person 1.0
    Name-First: Christian
    Name-Last: Zimmermann
    Name-Full: Christian Zimmermann
    Workplace-Organization: RePEc:edi:deuctus
    Email: [email protected]
    Homepage: http://ideas.repec.org/zimm/
    Author-Paper: repec:cre:crefwp:33
    Author-Paper: repec:mtl:montde:2000-05
    Author-Software: repec:dge:qmrbcd:99
    Author-Software: repec:dge:qmrbcd:97
    Author-Paper: repec:uct:uconnp:2005-01
    Author-Article: repec:eee:jcecon:v:33:y:2005:i:1:p:88-106
    Author-Article: repec:eee:jmacro:v:26:y:2004:i:4:p:637-659
    Author-Paper: repec:sce:scecf5:372
    Author-Paper: repec:red:sed005:561
    Short-Id: pzi1
    Handle: repec:per:1964-12-14:christian_zimmermann
    Last-Login-Date: 2005-11-21 15:25:20 -0500
    Registered-Date: 2004-02-29 17:36:09 -0600

Figure 1 provides a screenshot of a web page that renders these data.

Figure 1. Author web page for Christian Zimmermann from the EconPapers service.

Registered authors receive a monthly notification, which includes statistics on abstract views and downloads of their works.

Assessment of RAS Data

In order to assess document and author coverage in RAS, we perform basic statistical analysis on two databases: RePEc and RAS. RePEc contains 362,080 documents. RAS contains 124,447 documents that have been claimed by at least one registered author. This observation indicates that roughly one in three papers in RePEc is included in the RAS database. Furthermore, the number of authorships in RAS is 152,072, whereas the corresponding number in RePEc is 609,225, suggesting that about one in four authorships in RePEc is covered in the RAS.

At a disaggregated level, we examine how many registered authors have claimed to be an author of a paper. Table 1 lists, for each number of authors per paper, the corresponding number of papers in the entire RePEc database and in the RAS database.
RePEc contains 180,716 (49.91%) documents with a single author, whereas RAS contains 99,562 (80%) documents with one registered author. It is quite possible that a document has more than one author, but only one of them has registered in the RAS. In RePEc, there are 1.68 authors per document, whereas in RAS there are 1.22 authors per document. While the maximum number of authors for a single document reaches 24 in the RePEc database, it is 8 in the RAS database.

Table 1. Distribution of the number of authors per paper in RePEc and RAS

    Number of authors   Number of papers (RePEc)   Number of papers (RAS)
    1                   180716 (49.91%)            99562 (80.00%)
    2                   129638 (35.80%)            22315 (17.93%)
    3                   42427 (11.72%)             2425 (1.95%)
    4                   7021 (1.94%)               130 (0.10%)
    5                   1338 (0.37%)               9 (0.01%)
    6                   425 (0.12%)                4 (0.00%)
    7                   193 (0.05%)                1 (0.00%)
    ≥ 8                 99 (0.03%)                 1 (0.00%)

Table 2 provides basic statistics for RAS registrants. Of the 12,381 registrants in the RAS database, 8,666 have claimed at least one document, while the remaining 3,715 have claimed none. It is difficult to explain why some individuals have registered for the RAS but not claimed any papers. Since such records contribute no authorship data, they are excluded from further analysis. The average number of papers per registered author is 17.55, which is considerably higher than the average number of papers per author in various academic fields reported in other similar studies. For example, Newman (2004b) reports that the average number of papers per author is 6.4 in biology, 5.1 in physics, and 6.9 in mathematics, based on the number of authors in Medline, the Physics E-print archive, and Mathematical Reviews, respectively.

There are several possible explanations for the difference. First and foremost, RAS registrants are likely to be active in research and publication, as evidenced by their interest in the service, and hence more prolific than an average author found in an indexing and abstracting database.
Second, RePEc covers both journal articles and working papers, and might contain both the working version(s) and the formally published form of a document. Finally, it is likely that the prior studies underestimated the averages due to the difficulties outlined above in reliably distinguishing variations of the name of the same author.

Table 2. Summary statistics for RAS registrants

    Number of RAS registrants                            12381
    Number of registrants who did not claim a paper      3715
    Number of registrants who claim at least one paper   8666
    Number of authorships                                152072
    Average number of papers/author                      17.55

Figure 2 provides the frequency distribution of authors by the number of documents they claim. A close examination of the data reveals a tendency for authors who have claimed a large number of documents to register with RAS. In particular, three RAS registrants claim over 300 documents, fifteen registrants claim between 200 and 300 documents, and so on. The data show that 87% of authors have two or more articles, 76% have three or more, and 69% have four or more.

Figure 2. Frequency distribution of authors by number of documents (x-axis: number of documents; y-axis: percentage of authors). Percentage of authors with at least the given number of documents: 2: 87%, 3: 76%, 4: 69%, 5: 63%, 6: 58%, 7: 54%, 8: 50%, 11: 42%, 17: 30%, 26: 20%, 45: 10%.

According to the well-known Lotka’s Law of scientific productivity, about 60% of the authors publish only once (Egghe, 2005), which is not the case here. Using the software program LOTKA (Rousseau and Rousseau, 2000), we test whether Lotka’s Law is valid for RAS registrants. We find that:

$$ f(y) = \frac{0.3174}{y^{1.3927}} $$

where f(y) denotes the relative number of authors with y publications. According to the Kolmogorov-Smirnov statistic (Dmax = 0.2022), the observed author publication frequencies cannot be described by the Lotka distribution. This indicates that interest in RAS is more widespread among researchers who are highly productive.
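To make the mismatch with Lotka's Law concrete, the fitted curve can be evaluated and set against the cumulative shares quoted in the text. The following is a minimal sketch, assuming the reported parameters (0.3174 and 1.3927) are taken as given; the function names are illustrative, and this is not the procedure implemented by the LOTKA program, nor an attempt to reproduce the reported Dmax.

```python
# Fitted Lotka parameters as reported in the text; treated here as given.
C = 0.3174
ALPHA = 1.3927


def lotka_fraction(y, c=C, alpha=ALPHA):
    """Relative number of authors with exactly y publications: c / y**alpha."""
    if y < 1:
        raise ValueError("y must be a positive integer")
    return c / (y ** alpha)


def lotka_cdf(y_max, c=C, alpha=ALPHA):
    """Fitted share of authors with at most y_max publications."""
    return sum(lotka_fraction(y, c, alpha) for y in range(1, y_max + 1))


# Observed shares of authors with AT LEAST n documents, as quoted in the text.
OBSERVED_AT_LEAST = {2: 0.87, 3: 0.76, 4: 0.69}


def cdf_gaps(observed=OBSERVED_AT_LEAST):
    """Absolute gap between fitted and observed cumulative shares.

    A share of authors with at least n documents implies that
    1 - share of authors have at most n - 1 documents.
    """
    gaps = {}
    for n, share in observed.items():
        observed_cdf = 1.0 - share
        gaps[n - 1] = abs(lotka_cdf(n - 1) - observed_cdf)
    return gaps


if __name__ == "__main__":
    for y, gap in sorted(cdf_gaps().items()):
        print(f"y <= {y}: fitted {lotka_cdf(y):.3f}, gap {gap:.3f}")
```

Under the fitted curve roughly 32% of authors would have a single publication, whereas the quoted figures imply only about 13% of RAS registrants do, so the gap is already large at the first point of the distribution.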
Prolific authors also act as hubs in the co-authorship network, as we will demonstrate in our analyses.

RAS co-authorship analysis

In this section, we investigate the structure of research collaborations within the co-authorship network of RAS using social network analysis methodology.

Introduction to social network analysis

Social network analysis is concerned with understanding the linkages among social entities and the implications of these linkages (Wasserman & Faust, 1994). A social network consists of a set of social entities and the relations defined on them, referred to as "actors" and "ties" respectively. A network can be presented as a graph, which consists of points (or nodes) to represent actors and lines (or edges) to represent ties or relations. An example of a social network diagram is given below.

Figure 3. A social network diagram

Co-authorship network studies

In recent years, there has been keen interest in co-authorship networks as a means of examining the patterns of collaboration within an academic community and determining the status and influence of individual researchers. Recent studies of research collaboration in a number of academic fields demonstrate that social network analysis plays a critical role in determining how well connected a research community is. The academic fields range from physics, biomedical research, mathematics, and computer science (Newman, 2004a; Newman, 2004b) and neuroscience (Barabasi, Jeong, Neda, Ravasz, Schubert, & Vicsek, 2005) to the digital library research community (Liu, Bollen, Nelson, & Van de Sompel, 2005), economics (Goyal, van der Leij, & Moraga-Gonzalez, 2004), and information sciences (Otte & Rousseau, 2002).

RAS co-authorship network

In the co-authorship network model for this study, nodes represent authors and an edge connects two authors if they coauthored one or more papers.
We build a binary graph representing a nondirectional dichotomous relationship; that is, a tie is either present or absent between each pair of authors. In this study, we do not take the frequency of coauthorship between authors into account. We plan to investigate weighted co-authorship networks in the future. Table 3 shows the summary statistics for RAS co-authorship networks.

Table 3. Summary statistics for RAS authors and co-authorship networks

    Number of authorships by co-authors                 137550
    Number of authors with at least one co-author       5661
    Number of authorships with at least one co-author   109924
    Average number of collaborators/co-author           2.05
    Size of the largest component                       4659
    Number of components                                382
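The binary, undirected network model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the input is assumed to be a mapping from paper identifiers to the registered authors who claimed them, the identifiers in the toy data are hypothetical, and authors without co-authors are kept as isolated nodes (the source does not say how such authors were handled when counting components).

```python
from collections import deque
from itertools import combinations


def build_coauthor_graph(papers):
    """Build an undirected binary graph: author -> set of co-authors.

    Repeated collaborations add nothing new because ties are
    dichotomous (present or absent), not weighted.
    """
    graph = {}
    for authors in papers.values():
        unique = sorted(set(authors))
        for author in unique:
            graph.setdefault(author, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the components as sets of authors, largest first."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        while queue:  # breadth-first traversal of one component
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def mean_collaborators(graph):
    """Average number of distinct co-authors among authors who have any."""
    degrees = [len(nbrs) for nbrs in graph.values() if nbrs]
    return sum(degrees) / len(degrees) if degrees else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: paper id -> registered authors.
    toy_papers = {
        "p1": ["a", "b"],
        "p2": ["b", "c"],
        "p3": ["d", "e"],
        "p4": ["f"],
        "p5": ["a", "b"],  # repeat collaboration, same single tie
    }
    g = build_coauthor_graph(toy_papers)
    print([len(c) for c in connected_components(g)])
    print(mean_collaborators(g))
```

Counting a repeated collaboration only once mirrors the decision to ignore coauthorship frequency; a weighted variant would store a count per pair instead of a set of neighbours.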




Publication date: 2006